Sampling Variability – The Heart of Inference

Salaries of football coaches

Sampling Strategies

What types of samples could we collect? Are some methods “better” than other methods?

At your table…

First

  • each person samples 10 salaries
  • calculate the median

Then

  • calculate the median of all 25 salaries

Each table has a sample of 25 UC & CSU coach salaries.


Would you feel comfortable inferring that the median salary of your sample is close to the median salary of all UC & CSU coaches?


Why or why not?

Why sample more than once?

Variability is a central focus of the discipline of Statistics!

Making decisions based on limited information is uncomfortable!

You likely weren’t willing to infer the population median salary from your sample!

Sampling Framework

population – collection of observations / individuals we are interested in

population parameter – numerical summary about the population that is unknown but you wish you knew


sample – a collection of observations from the population

sample statistic (point estimate) – a summary statistic computed from a sample that estimates the unknown population parameter
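The four terms above can be lined up in a few lines of base R. This is a minimal sketch that assumes a simulated stand-in population: the `population` vector is hypothetical (drawn from a log-normal distribution), not the real coach salaries, and the names `population_median`, `my_sample`, and `sample_median` are illustrative.

```r
# Hypothetical stand-in population: 252 simulated salaries (NOT the real coach data)
set.seed(2019)
population <- round(rlnorm(252, meanlog = log(140000), sdlog = 0.6))

# Population parameter: normally unknown. We can compute it here only
# because we simulated every individual in the population.
population_median <- median(population)

# Sample: 25 individuals drawn at random (without replacement)
my_sample <- sample(population, size = 25)

# Sample statistic (point estimate) of the population median
sample_median <- median(my_sample)

cat("Population median (parameter):", population_median, "\n")
cat("Sample median (statistic):    ", sample_median, "\n")
```

In practice only the last two objects are available to us; the parameter is what we are trying to learn about.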

Statistical Inference

There were 252 “Head Coaches” at University of California and California State University campuses in 2019 (that satisfied my search criteria).


Median salary for all 252 coaches

$137,619

Inferring information from your sample onto the population is called statistical inference.

Statistical Inference Reasoning

  • If the sampling is done at random, then
  • the sample is representative of the population, so
  • any result based on the sample can generalize to the population, and
  • the point estimate is a “good guess” of the unknown population parameter.



Shouldn’t one random sample be enough then? Isn’t that what we use to make confidence intervals and do hypothesis tests?

Virtual Sampling

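library(infer)  # provides rep_sample_n()

# Draw one random sample (reps = 1) of 25 rows from the coaches data
rep_sample_n(coaches, 
             size = 25, 
             reps = 1, 
             replace = TRUE)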


Employee Name             Job Title                   Total Pay & Benefits
Beau Baldwin              Asc Head Coach Crd 4        708408
Stein Metzger             Intercol Ath Head Coach Ex  191728
Jordan Wolfrum            Intercol Ath Head Coach Ex  76597
David Bradley Kreutzkamp  Head Coach 5                105683
Daniel Dykes              Head Coach 5                540000
Daniel Conners            Head Coach 5                156181

\(\vdots\)

Distribution of 1000 medians from samples of 25 coaches
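One way to build a distribution like this is to repeat the virtual sampling many times and keep the median each time. The sketch below uses base R only and assumes a simulated stand-in population, because the real `coaches` data is not reproduced here. The `population` vector and the helper `one_sample_median()` are hypothetical; sampling is done with replacement to mirror `replace = TRUE` in the call above.

```r
# Hypothetical stand-in population: 252 simulated salaries (NOT the real coach data)
set.seed(1)
population <- round(rlnorm(252, meanlog = log(140000), sdlog = 0.6))

# Take one virtual sample of size n and return its median
one_sample_median <- function(pop, n) {
  median(sample(pop, size = n, replace = TRUE))
}

# Repeat the virtual sampling 1000 times, keeping one median per sample
medians_25 <- replicate(1000, one_sample_median(population, 25))

# Where do the sample medians fall relative to the population median?
print(summary(medians_25))
cat("Population median:", median(population), "\n")
```

Calling `hist(medians_25)` afterwards would draw the histogram of the 1000 medians.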

Sampling Distributions

  • Visualize the effect of sampling variation on the distribution of any point estimate
    • In this case, the sample median
  • We can use sampling distributions to make statements about what values we can typically expect.

Be careful! A sampling distribution is different from a sample’s distribution!

Distributions of 1000 medians from different sample sizes

What differences do you see?

Variability for Different Sample Sizes

Sample Size   Standard Error of Median ($)
25            19343.969
50            12459.358
100           8279.311

  • Standard errors quantify the variability of point estimates

  • As a general rule, as sample size increases, the standard error decreases.

Careful! There are important differences between standard errors and standard deviations.
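A standard error like the ones in the table can be approximated as the standard deviation of many simulated sample medians. The sketch below assumes a simulated stand-in population, so its numbers will not match the table. The `population` vector and the helper `standard_error_of_median()` are hypothetical names. The last lines contrast a standard error with the standard deviation of the salaries inside a single sample.

```r
# Hypothetical stand-in population: 252 simulated salaries (NOT the real coach data)
set.seed(42)
population <- round(rlnorm(252, meanlog = log(140000), sdlog = 0.6))

# Standard error of the median: the standard deviation of the medians
# from many repeated samples of the same size
standard_error_of_median <- function(pop, n, reps = 1000) {
  medians <- replicate(reps, median(sample(pop, size = n, replace = TRUE)))
  sd(medians)
}

sizes <- c(25, 50, 100)
se <- sapply(sizes, function(n) standard_error_of_median(population, n))
print(data.frame(sample_size = sizes, standard_error = se))

# Standard deviation: the spread of the salaries within ONE sample
one_sample <- sample(population, size = 100, replace = TRUE)
cat("SD of salaries in one sample of 100:", sd(one_sample), "\n")
```

The standard deviation describes how much individual salaries vary, while the standard error describes how much the point estimate varies from sample to sample.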

A good guess?

Precision & Accuracy

  • Random sampling makes our point estimates accurate: they are not systematically too high or too low.


  • Larger sample sizes make our point estimates more precise: they vary less from sample to sample.
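The difference between accuracy and precision can be seen by comparing random sampling with a deliberately non-random scheme. This is a rough sketch under stated assumptions: the `population` is simulated, and the "biased" scheme (sampling only from the top-paid half, `top_half`) is a made-up illustration, not something from the coach data.

```r
# Hypothetical stand-in population: 252 simulated salaries (NOT the real coach data)
set.seed(7)
population <- round(rlnorm(252, meanlog = log(140000), sdlog = 0.6))
true_median <- median(population)

# Random sampling, n = 25
random_medians <- replicate(1000, median(sample(population, size = 25)))

# Made-up non-random scheme: only the top-paid half can be sampled, n = 100
top_half <- sort(population, decreasing = TRUE)[1:126]
biased_medians <- replicate(1000, median(sample(top_half, size = 100)))

# Accuracy: how far is the average estimate from the true median?
cat("Average error, random sampling:", mean(random_medians) - true_median, "\n")
cat("Average error, biased sampling:", mean(biased_medians) - true_median, "\n")

# Precision: how much do the estimates vary from sample to sample?
cat("Spread, random sampling (n = 25): ", sd(random_medians), "\n")
cat("Spread, biased sampling (n = 100):", sd(biased_medians), "\n")
```

In this sketch the larger non-random samples give estimates that agree closely with each other yet sit far from the true median, so a large sample size alone does not make an estimate a good guess.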

Sampling Activity!